Raster Data Processing

Advanced Geospatial Data Processing for Social Scientists

Dennis Abel & Stefan Jünger

2025-04-28

Day Time Title
April 28 10:00-11:15 Introduction
April 28 11:15-11:30 Coffee Break
April 28 11:30-13:00 Raster data in R
April 28 13:00-14:00 Lunch Break
April 28 14:00-15:15 Raster data processing
April 28 15:15-15:30 Coffee Break
April 28 15:30-17:00 Graphical display of raster data in maps
April 29 10:00-11:15 Datacube processing I
April 29 11:15-11:30 Coffee Break
April 29 11:30-13:00 Datacube processing II & API access
April 29 13:00-14:00 Lunch Break
April 29 14:00-15:15 Data integration and linking (with survey data)
April 29 15:15-15:30 Coffee Break
April 29 15:30-17:00 Outlook and open session with own application

Our plan

Thus far, we have learned about

  • Different data formats
  • How to load them
  • First steps in interacting with them

In this session, you will learn

  • How to wrangle raster data even further
  • Linking to vector data
  • Manipulating the raster values
  • How to converse from one format into the other

Subsetting

Cropping raster data

Cropping is a method of cutting out a specific ‘slice’ of a raster layer based on an input dataset or geospatial extent, such as a bounding box. We often do this to ‘zoom’ in on a dataset or to make our computations more efficient. Let’s pretend we are mainly interested in the small-scale population size of Eastern Germany. For this purpose, we can use the bounding box of Eastern Germany.

Bounding box in R

The easiest way (in my opinion) to create a bounding box in R is to use the sf::st_bbox() function, possibly based on another geospatial dataset.

bbox_east <- 
  sf::st_read("./data/VG250_LAN.shp") |> 
  dplyr::filter(SN_L %in% 11:16, GF == 4) |> 
  sf::st_transform(25832) |> 
  sf::st_bbox() |> 
  sf::st_as_sfc(crs = 25832)

bbox_east
Reading layer `VG250_LAN' from data source 
  `C:\Users\mueller2\a_talks_presentations\advanced_geospatial\data\VG250_LAN.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 35 features and 23 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 280371.1 ymin: 5235856 xmax: 921292.4 ymax: 6106244
Projected CRS: ETRS89 / UTM zone 32N
Geometry set for 1 feature 
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 562009.3 ymin: 5562814 xmax: 921292.4 ymax: 6068746
Projected CRS: ETRS89 / UTM zone 32N

Cropping the Census data on population numbers

Now, cropping is easy using the terra::crop() function.

pop_grid_2022 <- terra::rast("./data/z22/population.tif")

pop_grid_2022_crop <- terra::crop(pop_grid_2022, bbox_east)

terra::ext(pop_grid_ger_2020)
terra::ext(pop_grid_ger_2020_crop)
SpatExtent : 274144.445273998, 926449.115580001, 5236495.3402849, 6104900.94431191 (xmin, xmax, ymin, ymax)
SpatExtent : 562279.023568674, 921446.779151274, 5562647.6754379, 6068884.12202507 (xmin, xmax, ymin, ymax)

Note the different spatial extents.

Plotting the different extents

terra::plot(pop_grid_2022)

terra::plot(pop_grid_2022_crop)

Wait a minute…

…does these data only include Eastern German states? Let’s have a look at their shapes.

east_german_states <- 
  sf::st_read(
    "./data/VG250_LAN.shp",
    quiet = TRUE
  ) |> 
  dplyr::filter(SN_L %in% 11:16, GF == 4)

plot(pop_grid_ger_2020_crop)
plot(
  sf::st_geometry(east_german_states), 
  border = "white", 
  col = scales::alpha("white", .2), 
  lwd = 4, 
  add = TRUE
)

First solution: terra::mask()

Masking is similar to cropping, yet values outside the extent are set to missing values (NA).

pop_grid_2022_mask <- 
  terra::mask(
    pop_grid_2022, 
    terra::vect(east_german_states) # !
  )

plot(pop_grid_2022_mask)

Second and best solution: combining masking and cropping

pop_grid_2022_crop <- 
  terra::mask(
    pop_grid_2022, 
    terra::vect(east_german_states) # !
  ) |> 
  terra::crop(east_german_states)

plot(pop_grid_2022_crop)

Exercise 3_1: Subsetting Raster Data

Extraction & Aggregation

Changes in terminology

If we only want to add one attribute from a vector dataset Y to another vector dataset X, we can conduct a spatial join using sf::st_join() as shown earlier. In the raster data world, these operations are called raster extractions.

Raster data are helpful when we aim to

  • Apply calculations that are the same for all geometries in the dataset
  • Extract information from the raster fast and efficient

Pulling in some data

For this effort, we re-import the synthetic survey geocoordinates. We sample 100 geocoordinates from the whole dataset, as we only need a few to demonstrate the procedures that are followed.

points <- 
  readRDS(
    "./data/synthetic_survey_coordinates.rds"
  ) |> 
  dplyr::sample_n(size = 100) |> 
  sf::st_transform(25832)

points
Simple feature collection with 100 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 297016.6 ymin: 5303745 xmax: 835601.9 ymax: 6038499
Projected CRS: ETRS89 / UTM zone 32N
# A tibble: 100 × 3
      id           geometry foreigner
 * <int>        <POINT [m]>     <int>
 1  1206 (706315.1 5675895)         0
 2  1453 (616273.7 5678678)         0
 3   594 (835601.9 5661613)         1
 4   375 (412261.7 5313919)         0
 5   910 (786815.3 5641956)         0
 6  1636 (412194.7 5318920)         0
 7  1991 (712105.4 5309832)         0
 8   758 (669045.5 5621391)         0
 9   745 (686947.4 5702637)         0
10  1383   (458020 5404571)         0
# ℹ 90 more rows

Raster extraction

We use the following to extract the raster values at a specific point by location.

terra::extract(pop_grid_2022, points, ID = FALSE) |> 
  head(10)
       cat_0
1  2321.6440
2   723.8215
3  3106.7356
4  4879.3154
5  2363.5681
6  4449.1084
7  1214.7557
8   173.1610
9   376.0475
10        NA

Add results to existing dataset

This information can be added to an existing dataset (our points in this example).

points$pop <- 
  terra::extract(pop_grid_2022, points, ID = FALSE)[[1]]

points
Simple feature collection with 100 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 297016.6 ymin: 5303745 xmax: 835601.9 ymax: 6038499
Projected CRS: ETRS89 / UTM zone 32N
# A tibble: 100 × 4
      id           geometry foreigner   pop
 * <int>        <POINT [m]>     <int> <dbl>
 1  1206 (706315.1 5675895)         0 2322.
 2  1453 (616273.7 5678678)         0  724.
 3   594 (835601.9 5661613)         1 3107.
 4   375 (412261.7 5313919)         0 4879.
 5   910 (786815.3 5641956)         0 2364.
 6  1636 (412194.7 5318920)         0 4449.
 7  1991 (712105.4 5309832)         0 1215.
 8   758 (669045.5 5621391)         0  173.
 9   745 (686947.4 5702637)         0  376.
10  1383   (458020 5404571)         0   NA 
# ℹ 90 more rows

More elaborated: spatial buffers

Sometimes, extracting information 1:1 is not enough.

  • It’s too narrow
  • There is missing information about the surroundings of a point

Buffer extraction

We can use spatial buffers of different sizes to extract information about our surroundings.

terra::extract(
  pop_grid_2022, 
  sf::st_buffer(points, 2500),
  fun = mean,
  na.rm = TRUE,
  ID = FALSE,
  raw = TRUE
) |> 
  head(10)
       cat_0
1  1240.2739
2   315.6907
3  2986.1965
4  3044.9062
5   623.9542
6  4101.2633
7   428.7217
8   133.1527
9   236.1438
10  422.3046
terra::extract(
  pop_grid_2022, 
  sf::st_buffer(points, 5000),
  fun = mean,
  na.rm = TRUE,
  ID = FALSE,
  raw = TRUE
) |> 
  head(10)
       cat_0
1   640.9744
2   203.0444
3  2879.6138
4  2830.0767
5   341.3598
6  3241.5910
7   182.7296
8   274.9478
9   196.3779
10  395.9871

Extraction toggles: touches

There’s a multitude of arguments that we can adjust to conduct the extraction. An important option is from which raster cells we want our extraction done. Here’s an example of how the argument terra::extract(..., touches = FALSE/TRUE) works.

touches = FALSE vs. touches = TRUE

Let’s see how the values differ when we apply the option or not.

terra::extract(
  pop_grid_2022, 
  sf::st_buffer(points, 2500),
  fun = mean,
  na.rm = TRUE,
  touches = FALSE,
  ID = FALSE,
  raw = TRUE
) |> 
  head(10)
       cat_0
1  1240.2739
2   315.6907
3  2986.1965
4  3044.9062
5   623.9542
6  4101.2633
7   428.7217
8   133.1527
9   236.1438
10  422.3046
terra::extract(
  pop_grid_2022, 
  sf::st_buffer(points, 2500),
  fun = mean,
  na.rm = TRUE,
  touches = TRUE,
  ID = FALSE,
  raw = TRUE
) |> 
  head(10)
       cat_0
1  1051.0779
2   242.8127
3  2899.3292
4  2756.0768
5   514.6613
6  4330.3823
7   331.6084
8   134.8928
9   242.8360
10  380.4303

Extraction function

Often, we default to the mean of raster cell values to be extracted. In our example, calculating the sum is more relevant as we deal with population counts. Or the maximum?

terra::extract(
  pop_grid_2022, 
  sf::st_buffer(points, 2500),
  fun = sum,
  na.rm = TRUE,
  touches = FALSE,
  ID = FALSE,
  raw = TRUE
) |> 
  head(10)
       cat_0
1  22324.930
2   2841.217
3  56737.733
4  54808.312
5   9359.312
6  69721.477
7   7288.269
8   1730.985
9   3306.013
10  4223.046
terra::extract(
  pop_grid_2022, 
  sf::st_buffer(points, 2500),
  fun = max,
  na.rm = TRUE,
  touches = TRUE,
  ID = FALSE,
  raw = TRUE
) |> 
  head(10)
       cat_0
1  2637.6086
2   723.8215
3  6575.6094
4  9204.9951
5  2502.1909
6  9900.1660
7  1214.7557
8   317.5967
9   496.4330
10  891.1635

Custom functions

We can even define custom functions.

terra::extract(
  pop_grid_2022, 
  sf::st_buffer(points, 2500),
  fun = function (x) {max(x, na.rm = TRUE) / sum(x, na.rm = TRUE)},
  touches = FALSE,
  ID = FALSE,
  raw = TRUE
) |> 
  head(10)
       cat_0
1  0.1181463
2  0.2547576
3  0.1069620
4  0.1679489
5  0.2673477
6  0.1419959
7  0.1666727
8  0.1834775
9  0.1501606
10 0.2110239

Raster aggregation

We can use the same procedure to aggregate a raster dataset into a vector polygon dataset. That’s a widespread use case. Let’s load our German districts vector dataset in ./data/. This time, we will use WorldPop raster data on gender ratios from 2020.

german_districts <- 
  sf::st_read(
    "./data/VG250_KRS.shp",
    quiet = TRUE
  ) |> 
  sf::st_transform(3035)

gender_ratios_2020 <-
  terra::rast("./data/gender_ratio_2020.tif")

plot(gender_ratios_2020)
plot(
  sf::st_geometry(german_districts), 
  border = "white", 
  col = NA, 
  lwd = .5, 
  add = TRUE
)

Adding the aggregated data

Again, we use terra::extract() to create the aggregated data, which we can add to our vector dataset.

german_districts$gender_ratios_2020 <-
  terra::extract(
    gender_ratios_2020, 
    german_districts, 
    fun = mean, 
    na.rm = TRUE, 
    ID = FALSE
  ) |> 
  unlist()

plot(german_districts["gender_ratios_2020"])

Manipulating the raster data

Digging deeper

The previous steps are efforts that work on the raw raster cell values data. However, there are occasions where we might want to work on the raster data themselves or pre-process them to add them to another dataset (e.g., a vector file). Some of these procedures are for visualization, such as heat maps, and others are necessary for our later analyses. We will show a few in the following.

Creating a quick ‘heat map’

Population counts for the whole of Germany help us to identify urban and rural clusters. But there may be applications where we need to zoom in on a specific city to identify within-city variations regarding specific attributes like foreigner shares. Let’s do that for the German capital Berlin. For this purpose, we subset the district data and mask and crop data on foreigner shares from the German census.

berlin <-
  german_districts |> 
  dplyr::filter(AGS == "11000")

foreigners_berlin <-
  terra::rast("./data/z22/foreigners.tif") |> 
  terra::project("EPSG:3035") |> 
  terra::mask(terra::vect(berlin)) |> 
  terra::crop(berlin)

plot(foreigners_berlin)

It’s easy

Although we can identify some clusters using the raw data, some smoothing may be helpful. We can use the terra::focal() function to do that. It applies a moving window filter on all raster cells of a grid. We’ll have a more detailed look at this function in a minute.

foreigners_berlin_smooth <- 
  terra::focal(
    foreigners_berlin, 
    w = matrix(1, 5, 5), 
    fun = mean, 
    na.rm = TRUE
  )

plot(foreigners_berlin_smooth)

‘Real’ point pattern analysis

Usually, when we talk about heat maps, we mean analyzing point patterns and whether they spatially cluster. terra might not be your best choice regarding more elaborated techniques to estimate density kernels and distance base bandwidths between points to draw clusters. For this purpose, packages such as spatstat are better suited but require learning about other data structures. That said, densities are more advanced ways of counting things in raster grid cells, and we can mimic this behavior also with terra. So, let’s stick to this package and crop all of our points to the extent of Berlin.

points_berlin <- 
  readRDS(
    "../../data/synthetic_survey_coordinates.rds"
  ) |>  
  sf::st_crop(berlin)

plot(
  points_berlin["foreigner"], 
  col = c("black", "green")
)

Creating a raster density template

Next, we simply want to count points with the attribute foreigner = 1 in raster grid cells for our density estimation. However, our points are sparse compared to our comprehensive raster dataset. We cannot initially rely on 1 km² grid cells as with the Census grid data. But 5 km² may be a good compromise. Let’s do that!

raster_template <-
  points_berlin |> 
  dplyr::filter(foreigner == 1) |> 
  sf::st_bbox() |> 
  terra::ext() |> 
  terra::rast(resolution = 5000, crs = "EPSG:3035")

raster_template
class       : SpatRaster 
dimensions  : 7, 7, 1  (nrow, ncol, nlyr)
resolution  : 4857.143, 4857.143  (x, y)
extent      : 4534500, 4568500, 3255500, 3289500  (xmin, xmax, ymin, ymax)
coord. ref. : ETRS89-extended / LAEA Europe (EPSG:3035) 

Why the hassle?

You may wonder why we are doing that. The answer is simple: We want to count the number of points in each grid cell. We use the function terra::rasterize() as a simple technique.

points_berlin_density <- 
  terra::rasterize(
    points_berlin, 
    raster_template, 
    fun = "sum", 
    background = 0
  ) 

plot(points_berlin_density)

Again, why the hassle?

Now, this is not very pleasant. These data looks… not good. But fear not, working with raster data is powerful, as we now use a function you already know for projecting one CRS into another: terra::project(). This function can also be used to aggregate and disaggregate data based on the structure of another dataset. So, while we could not initially use 1 km² grid cells for our density ‘estimation’, we can reproject our 5 km² onto a 1 km² grid like our Census grid data. A bit of masking also helps get rid of cells that are not within the Berlin border.

points_berlin_density <-
  points_berlin_density |> 
  terra::project(foreigners_berlin) |> 
  terra::mask(foreigners_berlin)

plot(points_berlin_density)

Smoothing

We can also apply smoothing as with the population grid data.

terra::focal(
  foreigners_berlin, 
  w = matrix(1, 5, 5), 
  fun = mean, 
  na.rm = TRUE
) |> 
  plot()

focal(
  points_berlin_density, 
  w = matrix(1, 5, 5), 
  fun = mean,
  na.rm = TRUE
) |> 
  plot()

Playing with the smoothing

terra::focal(
  foreigners_berlin, 
  w = matrix(1, 3, 3), 
  fun = mean, 
  na.rm = TRUE
) |> 
  plot()

focal(
  points_berlin_density, 
  w = matrix(1, 3, 3), 
  fun = mean,
  na.rm = TRUE
) |> 
  plot()

Playing with the smoothing

terra::focal(
  foreigners_berlin, 
  w = matrix(1, 9, 9), 
  fun = mean, 
  na.rm = TRUE
) |> 
  plot()

focal(
  points_berlin_density, 
  w = matrix(1, 9, 9), 
  fun = mean,
  na.rm = TRUE
) |> 
  plot()

What is this argument w?

It’s magic. Just kidding, but it is indeed really powerful. It builds around the idea of connecting the value of a focal grid cell to the values of surrounding grid cells, as in the figure below. Hence, the name of the function terra::focal() where the argument is used.

It’s just a simple base R matrix

When using this matrix as input and applying the statistic fun = mean, we change the value of the focal grid cell to the mean of the values of itself and the 8 surrounding grid cells. That’s what we did before in one example.

matrix(1, 5, 5)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    1    1    1
[2,]    1    1    1    1    1
[3,]    1    1    1    1    1
[4,]    1    1    1    1    1
[5,]    1    1    1    1    1

But we can change that however we want:

weighted_w <- matrix(c(rep(.5, 12), 1, rep(.5, 12)), 5, 5)

weighted_w
     [,1] [,2] [,3] [,4] [,5]
[1,]  0.5  0.5  0.5  0.5  0.5
[2,]  0.5  0.5  0.5  0.5  0.5
[3,]  0.5  0.5  1.0  0.5  0.5
[4,]  0.5  0.5  0.5  0.5  0.5
[5,]  0.5  0.5  0.5  0.5  0.5

Applying it to the foreigners grid data

terra::focal(
  foreigners_berlin, 
  w = matrix(1, 5, 5), 
  fun = mean, 
  na.rm = TRUE
) |> 
  plot()

focal(
  foreigners_berlin, 
  w = weighted_w, 
  fun = mean,
  na.rm = TRUE
) |> 
  plot()

Real life example: Edges of immigrant rates

In ethnic diversity research, whether sudden changes in the neighborhood composition may increase the potential for conflicts between groups is a relevant question. Researchers use edge detection algorithms from image processing to investigate such changes spatially.

What is edge detection?

Edge detection identifies sudden color changes in an image to ‘draw’ borders of things in a picture. Here’s an example of the prominent Sobel filter.

Source: https://en.wikipedia.org/wiki/Sobel_operator

We can do that as well using a Sobel filter

R is good for math, right? While this is the formula for applying the Sobel filter to a raster image…

\[r_x = \begin{bmatrix}1 & 0 & -1 \\2 & 0 & -2 \\1 & 0 & -1\end{bmatrix} \times raster\_file \\r_y = \begin{bmatrix}1 & 2 & 1 \\0 & 0 & 0 \\-1 & -2 & -1\end{bmatrix}\times raster\_file \\r_{xy} = \sqrt{r_{x}^2 + r_{y}^2}\]

Implementation in R

…we can easily translate it to be used in terra::focal()1.

sobel <- function(r) {
  fy <- matrix(c(1, 0, -1, 2, 0, -2, 1, 0, -1), nrow = 3)
  fx <- matrix(c(-1, -2, -1, 0, 0, 0, 1, 2, 1) , nrow = 3)
  rx <- terra::focal(r, fx)
  ry <- terra::focal(r, fy)
  sqrt(rx^2 + ry^2)
}

foreigners_berlin_edges <- sobel(foreigners_berlin_smooth)

Comparison

We can now clearly display the edges of sudden changes in neighborhood composition within Berlin.

plot(foreigners_berlin_smooth)
plot(foreigners_berlin_edges)

Exercise 3_2: Extracting and Analyzing Raster Information

Conversion

Raster to points

raster_now_points <-
  foreigners_berlin |> 
  terra::as.points()

plot(raster_now_points)

Points to raster

raster_target_layer <- 
  terra::ext(raster_now_points) |> 
  terra::rast(res = 1000)

points_now_raster <- 
  raster_now_points |> 
  terra::rasterize(
    raster_target_layer, 
    field = "cat_0", 
    fun = "mean",
    background = 0
  )

plot(points_now_raster)

Raster to polygons

polygon_raster <-
  foreigners_berlin |>  
  terra::as.polygons() |> 
  sf::st_as_sf()

plot(polygon_raster)